DETAILED ACTION 

1. The Request for Continued Examination (RCE) filed on 4 August 2006 under 37 CFR 
1.53(d) based on parent Application No. 09/866,394 is acceptable and an RCE has been 
established. An action on the RCE follows. 

2. The amendments filed on 4 August 2006, submitted with the filing of the RCE, have been 
received and entered. The applicant has added new claims 39-42. Claims 1-5, 7-15, 17-24, 26-33 
and 35-42 as amended are pending in the application. 

Claim Rejections - 35 USC § 112 
The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of making 
and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it 
pertains, or with which it is most nearly connected, to make and use the same and shall set forth the best mode 
contemplated by the inventor of carrying out his invention. 

3. Claims 39-42 are rejected under 35 U.S.C. 112, first paragraph, as failing to comply with 
the written description requirement. The claim(s) contains subject matter which was not 
described in the specification in such a way as to reasonably convey to one skilled in the relevant 
art that the inventor(s), at the time the application was filed, had possession of the claimed 
invention. The specification on page 16, lines 21-22 describes the term "meaningful image" to 
refer to "a frame with a person's face, an important text, etc.". However, the specification does 
not mention that the meaningful image is a combination thereof of a person's face and important 
text. The specification merely mentions that the meaningful image can include other frames 
("etc."), however, "etc." does not provide for a specific description of the meaningful image to 
be a combination of a person's face and important text. 

4. Furthermore, the specification does not provide for the Markush claims recited in claims 
39-42 ("consisting of"). Markush claims must be provided with support in the disclosure for 
each member of the Markush group. As stated above, the specification does not provide 
adequate support for the Markush group member of "a combination thereof". The specification 
does not provide adequate description for the "group consisting of a person's face and important 
text and a combination thereof" because the specification does not support every member of the 
recited group. See MPEP 608.01(p). 

The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the 
subject matter which the applicant regards as his invention. 

5. Claims 39-42 are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for 
failing to particularly point out and distinctly claim the subject matter which applicant regards as 
the invention: 

The term "a combination thereof" is vague and indefinite. The specification does not 
describe the "meaningful image" as being able to consist of a combination of text and a person's 
face; the specification does not provide for an adequate description of "a combination thereof", 
therefore rendering the term vague and indefinite. 

The term "important text" in claims 39-42 is a relative term which renders the claim 
indefinite. The term "important text" is not defined by the claim, the specification does not 
provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would 
not be reasonably apprised of the scope of the invention. The specification does not provide a 
standard or criteria for determining what text is considered important text and what text is not 
considered important text and therefore, it is indefinite. 

Claim Rejections - 35 USC § 103 
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

6. Claims 1-5, 7-15, 17-24, 26-33 and 35-42 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over the article entitled "Color SuperHistograms for Video Representation", written 
by Dimitrova et al., and Wang et al. U.S. Patent 5,805,733. 

Referring to claims 1, 11, 21 and 30, Dimitrova et al. teach an apparatus, system, method 
and computer executable instructions comprising a visual summary controller capable of creating 
a visual summary of video material (Dimitrova et al.: page 316, Figure 1), wherein the visual 
summary controller receives keyframes of the video material, extracts frame signatures from the 
keyframes to establish a plurality of family histograms and orders the plurality of family 
histograms to create respective superhistograms each including multiple family histograms 
(keyframes from video material are extracted to create family histograms, which are then ordered 
to create superhistograms) (Dimitrova et al.: page 314, right column, lines 9-25, page 315, 
section 2 and page 316, section 2.3; this is further shown in Figure 1). However, although 
Dimitrova et al. teach using the frame signatures and superhistograms to create a visual summary 
of video material in a broad sense (representing video segments by computing superhistograms) 
(Dimitrova et al.: Abstract), Dimitrova et al. fail to explicitly teach selecting representative 
keyframe images for each superhistogram to create a compact visual summary of the video 
material, wherein the representative keyframe images for each superhistogram include at least 
one of the first image in each family histogram, a randomly chosen image and an image that is 
closest to a center of each family histogram. Wang et al. teach the analysis of scenes and frames 
in video materials (Wang et al.: column 1, lines 53-56 and Figure 2) similar to that of Dimitrova 
et al. In addition, Wang et al. further teach selecting representative keyframe images for each 
superhistogram (a representative frame image is taken from each family histogram, i.e. set of 
related scenes; there are a plurality of sets of related scenes, which make up the superhistogram) 
(Wang et al.: column 1, lines 51-67, column 2, lines 1-24 and column 3, lines 37-65; this is 
further shown in Figure 3), wherein the representative keyframe images for each superhistogram 
include at least one of the first image in each family histogram, a randomly chosen image and an 
image that is closest to a cluster center of each family histogram (the representative frame 
images for the superhistogram, or the plurality of sets of related scenes; the representative frame 
for each set can be taken as the temporally medial scene in the set, i.e. the frame image that is 
closest to the center of the set, or family histogram) (Wang et al.: column 3, lines 37-66). It 
would have been obvious to one of ordinary skill in the art, having the teachings of Dimitrova et 
al. and Wang et al. before him at the time the invention was made, to modify the visual summary 
controller capable of extracting frame signatures from keyframes to create superhistograms of 
Dimitrova et al., to include the further step of selecting representative keyframes from those 
superhistograms and using the representative keyframe images to create a compact visual 
summary, taught by Wang et al. One would have been motivated to make such a combination in 
order to meet the need of being able to readily access and manipulate video information, by 
cataloguing and storing the potentially thousands of hours of video for rapid future retrieval, 
browsing and use, created by the increasing availability and use of digital video and the 
increasing integration of computer technologies and video production technologies. 
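
For illustration only, the combination described above (grouping keyframe histograms into family histograms, then choosing the image closest to the center of each family) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of either reference: the function names, the L1 comparison, the running-average family histogram and the threshold parameter are all hypothetical choices made for the example.

```python
def l1(h1, h2):
    """L1 distance between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def build_families(keyframe_histograms, threshold):
    """Assign each keyframe to the closest existing family, or start a new one.

    Hypothetical rule: a keyframe joins a family only if its distance to the
    family's mean histogram is below `threshold`.
    """
    families = []  # each family: {"mean": [...], "members": [keyframe indices]}
    for idx, hist in enumerate(keyframe_histograms):
        best, best_d = None, threshold
        for fam in families:
            d = l1(hist, fam["mean"])
            if d < best_d:
                best, best_d = fam, d
        if best is None:
            families.append({"mean": list(hist), "members": [idx]})
        else:
            n = len(best["members"])
            # Update the family histogram as a running average of its members.
            best["mean"] = [(m * n + h) / (n + 1) for m, h in zip(best["mean"], hist)]
            best["members"].append(idx)
    return families

def representative(family, keyframe_histograms):
    """Index of the member keyframe closest to the family's mean histogram."""
    return min(family["members"],
               key=lambda i: l1(keyframe_histograms[i], family["mean"]))

# Toy two-bin histograms for five keyframes (made-up values).
hists = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]]
families = build_families(hists, threshold=0.5)
reps = [representative(f, hists) for f in families]
```

In this toy input, keyframes 0, 1 and 4 fall into one family and keyframes 2 and 3 into another, and one representative index is returned per family.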

Referring to claims 2, 12, 22 and 31, Dimitrova et al. teach the filtering of keyframes 
(merging of histograms into family histograms) and extracting frames signatures (computing 
color histograms) from the filtered keyframes before using the frame signatures (histograms) to 
create the superhistogram representing a visual summary of the video material (page 315, right 
column, section 2 and page 316, left column, section 2.3). 

Referring to claims 3, 13, 23 and 32, Dimitrova et al. teach the use of superhistograms to 
cluster the filtered keyframes (the ordered merging of the family histograms to create the 
superhistogram), wherein the clustered keyframes (superhistogram) represent the visual 
summary of the video material, as recited on page 314, right column, lines 11-25 and shown in 
Figure 1. 

Referring to claims 4 and 14, Dimitrova et al. teach the use of a histogram as the frame 
signature used to compute superhistograms (page 314, right column, lines 11-15). 

Referring to claims 5, 15, 24 and 33, Dimitrova et al. teach the use of the L1 distance measure 
method, L2 distance measure method, histogram intersection method, Chi-Square test and Bin-wise 
histogram intersection method to compute the histogram difference (page 315, right 
column). 
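
As an illustrative aside, four of the histogram difference measures named above can be sketched with their common textbook definitions. These are assumptions for the example and are not necessarily the exact formulas used in the cited article; the bin-wise intersection variant is omitted because its definition is not given here. All function names are hypothetical.

```python
import math

def l1_distance(h1, h2):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def l2_distance(h1, h2):
    """Euclidean distance between the bin vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def histogram_intersection(h1, h2):
    """Similarity: overlapping mass divided by the smaller histogram's mass."""
    denom = min(sum(h1), sum(h2))
    if denom == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(h1, h2)) / denom

def chi_square(h1, h2):
    """Chi-square statistic, skipping bins that are empty in both histograms."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return total

# Two normalized three-bin histograms (made-up values).
h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
```

Note that the intersection is a similarity (larger means more alike), while the other three are distances (larger means less alike).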

Referring to claims 7, 17, 26 and 35, Dimitrova et al. teach the ability to select the family 
histograms (the top n largest families) to use to create the superhistogram used to create the 
visual summary (page 316, section 2.4). 
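
The selection of the top n largest families can be illustrated with a short sketch. It assumes, hypothetically, that each family is identified by a label and that its size is the number of keyframes it contains; neither the labels nor the function name come from the cited references.

```python
def top_n_families(family_sizes, n):
    """Return the labels of the n largest families, largest first."""
    ranked = sorted(family_sizes.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, _ in ranked[:n]]

# Made-up family sizes (keyframes per family).
sizes = {"A": 3, "B": 7, "C": 5}
largest_two = top_n_families(sizes, 2)
```

With these sizes, the two largest families are B and C, in that order.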

Referring to claims 8, 18, 27 and 36, while Dimitrova et al. teach all of the limitations as 
applied to claims 1, 11, 21 and 30 above, Dimitrova et al. fail to explicitly teach the capability to 
retrieve a visual summary stored in a memory unit and causing the visual summary to be 
displayed in response to a user request. Wang et al. teach the analysis of scenes and frames in 
video materials (Wang et al.: column 1, lines 53-56 and Figure 2) similar to that of Dimitrova et 
al. In addition, Wang et al. further teach the capability of letting a user select a visual summary 
for viewing, retrieving that visual summary from memory and displaying it in response to the 
user's request (displaying visual summaries of scenes in a movie bar and allowing users to 
access the summaries by selecting the segments corresponding to the scenes) (Wang et al.: 
column 2, lines 16-29 and shown in Figures 2 and 3). It would have been obvious to one of 
ordinary skill in the art, having the teachings of Dimitrova et al. and Wang et al. before him at 
the time the invention was made, to modify the visual summary controller capable of extracting 
frame signatures from keyframes to create superhistograms of Dimitrova et al., to include the 
retrieval and display of the visual summary in response to a user request, as taught by Wang et 
al. One would have been motivated to make such a combination to give users the flexibility to 
select which scenes to watch, saving them time from having to browse through all of the other 
irrelevant scenes; furthermore, because the increasing availability and use of digital video and 
the increasing integration of computer technologies and video production technologies have 
produced the need to be able to readily access and manipulate video information, it would have 
been advantageous to make such a combination in order to provide users a way to summarize the 
content of video quickly and easily, in order to catalogue and store the potentially thousands of 
hours of video for rapid future retrieval, browsing and use. 

Referring to claims 9, 19, 28 and 37, Dimitrova et al. teach the use of the visual summary 
obtained from the superhistograms to access at least a portion of the video material (classifying 
and searching in video archives), as recited on page 317, section 4.2. 

Referring to claims 10, 20, 29 and 38, while Dimitrova et al. teach all of the limitations as 
applied to claims 1, 11, 21 and 30 above, Dimitrova et al. fail to explicitly teach the creation of 
new video material using the compact visual summaries. Wang et al. teach the analysis of scenes 
and frames in video materials (Wang et al.: column 1, lines 53-56 and Figure 2) similar to that of 
Dimitrova et al. In addition, Wang et al. further teach the creation of new video material using 
the compact visual summaries (a collage made up of representative frames for each set of 
summarized scenes) (Wang et al.: column 3, lines 53-57). It would have been obvious to one of 
ordinary skill in the art, having the teachings of Dimitrova et al. and Wang et al. before him at 
the time the invention was made, to modify the visual summary controller capable of extracting 
frame signatures from keyframes to create superhistograms of Dimitrova et al., to include the 
creation of new video material, as taught by Wang et al. It would have been advantageous for 
one to utilize such a combination in order to conserve processor time and storage space by 
utilizing the already existing visual summaries in the creation of new visual materials; 
furthermore, because the increasing availability and use of digital video and the increasing 
integration of computer technologies and video production technologies have produced the need 
to be able to readily access and manipulate video information, it would have been advantageous 
to make such a combination in order to provide users a way to summarize the content of video 
quickly and easily, in order to catalogue and store the potentially thousands of hours of video for 
rapid future retrieval, browsing and use. 

Referring to claims 39-42, Dimitrova et al., as modified, teach the representative frame 
including at least one of the most meaningful image in each superhistogram (the longest scene, 
which is most indicative of the content of the related scenes) (Wang: column 3, lines 59-62) 
selected from the group consisting of a person's face and important text and a combination 
thereof (Wang: Figure 3 shows a plurality of frames, with frames that display images of people, 
including the person's face). 

Response to Arguments 

7. Applicant's arguments filed 4 August 2006 have been fully considered but they are not 
persuasive. 

8. The applicant states that each scene in Wang includes a plurality of frames or images and 
thus corresponds to a family of histograms, and that the set of scenes corresponds to a 
superhistogram, and therefore, Wang teaches that there is a single representative frame for each 
superhistogram instead of a representative frame from each family histogram of the 
superhistogram. The examiner respectfully disagrees. Firstly, the examiner respectfully 
disagrees with the applicant's characterization of the Wang reference. Wang teaches that the 
scenes are grouped into sets of related scenes (column 3, lines 37-52); therefore, each set of 
related scenes, comprising a plurality of scenes, is a family histogram, and the plurality of sets of 
related scenes together make up a superhistogram. A representative keyframe is taken from each 
set of related scenes, i.e. from each family histogram (column 3, lines 53-63), and therefore, a 
plurality of keyframe images are taken for the superhistogram. Wang further teaches that the 
representative keyframe taken from a set of related scenes, i.e. the representative keyframe taken 
from the family histogram can be the medial scene in the set, i.e. center scene in the family 
histogram (column 3, lines 53-63). Since the keyframe image taken from the family histogram can 
be the image that is closest to a center of the family histogram, there are a plurality of keyframe 
images for the superhistogram of the plurality of sets of related scenes that are each the image 
that is closest to a center of their respective family histograms. In view of the above arguments, 
the examiner respectfully argues that the combination of Dimitrova and Wang teaches the 
subject limitations. 

9. Furthermore, the applicant argues that Wang does not teach the recited limitation of the 
most significant frame including a person's face and/or an important text. The examiner 
respectfully disagrees. As shown in Figure 3 of Wang, the displayed frames show a plurality of 
images of people, the images including the showing of the person's face. 



Conclusion 
Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Ting Zhou whose telephone number is (571) 272-4058. The 
examiner can normally be reached on Monday - Friday 7:00 am - 4:30 pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, John Cabeca, can be reached at (571) 272-4048. The fax phone number for the 
organization where this application or proceeding is assigned is (571) 273-8300. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 